Search | BVS Regional Portal

An evolutionary machine learning algorithm for cardiovascular disease risk prediction.

Ordikhani, Mohammad; Saniee Abadeh, Mohammad; Prugger, Christof; Hassannejad, Razieh; Mohammadifard, Noushin; Sarrafzadegan, Nizal.

PLoS One ; 17(7): e0271723, 2022.

Article in English | MEDLINE | ID: mdl-35901181

ABSTRACT

INTRODUCTION: This study developed a novel risk assessment model to predict the occurrence of cardiovascular disease (CVD) events. It uses a Genetic Algorithm (GA) to build an easy-to-use, highly accurate model calibrated on the Isfahan Cohort Study (ICS) database. METHODS: The ICS was a population-based prospective cohort study of 6,504 healthy Iranian adults aged ≥ 35 years who were followed for incident CVD over ten years, from 2001 to 2010. To develop a risk score, CVD prediction was solved with a well-designed GA, and the results were compared with classic machine learning (ML) and statistical methods. RESULTS: Several risk scores, such as the WHO and PARS models, were used as baselines for comparison because, like XPARS, they are chart-based models. The Framingham and PROCAM models were also applied to the dataset, with areas under the receiver operating characteristic curve (AUROC) of 0.633 and 0.683, respectively. The more complex deep learning model, a three-layer Convolutional Neural Network (CNN), performed best among the ML models with an AUROC of 0.74, and the GA-based eXplainable Persian Atherosclerotic CVD Risk Stratification (XPARS) model outperformed the statistical methods. XPARS with eight features achieved an AUROC of 0.76, and XPARS with four features an AUROC of 0.72. CONCLUSION: A risk model extracted using a GA substantially improves CVD prediction compared with conventional methods. It is clear and interpretable, and can be a suitable replacement for conventional statistical methods.
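The abstract does not specify how XPARS encodes a risk score as a chromosome. As a minimal sketch under assumed details, the code below treats a chart-style score as one integer point value per binary risk factor and uses a simple genetic algorithm to search for point values that maximize AUROC, the metric the abstract reports. The encoding, the operators (truncation selection, one-point crossover, random-reset mutation), the parameter values, the helper names (`auroc`, `risk_score`, `evolve_points`), and the synthetic cohort are all hypothetical, not the authors' method.

```python
import random


def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic, using average ranks for ties."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_rank_sum = sum(r for r, lab in zip(ranks, labels) if lab == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def risk_score(points, x):
    # Chart-style score: add the integer points of every risk factor present.
    return sum(p * xi for p, xi in zip(points, x))


def fitness(points, X, y):
    return auroc([risk_score(points, x) for x in X], y)


def evolve_points(X, y, n_features, pop_size=30, generations=40,
                  max_points=5, mutation_rate=0.1, seed=0):
    """Evolve one integer point value (0..max_points) per binary risk factor.
    Requires n_features >= 2 and pop_size >= 4 (hypothetical GA settings)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, max_points) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda c: fitness(c, X, y), reverse=True)
        elite = ranked[: pop_size // 2]            # truncation selection; elites survive
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randint(1, n_features - 1)   # one-point crossover
            child = a[:cut] + b[cut:]
            child = [rng.randint(0, max_points) if rng.random() < mutation_rate else g
                     for g in child]               # per-gene random-reset mutation
            children.append(child)
        pop = elite + children
    best = max(pop, key=lambda c: fitness(c, X, y))
    return best, fitness(best, X, y)


# Hypothetical synthetic cohort: 4 binary risk factors; only the first two raise the event rate.
data_rng = random.Random(1)
X = [[data_rng.randint(0, 1) for _ in range(4)] for _ in range(200)]
y = [1 if data_rng.random() < 0.1 + 0.35 * x[0] + 0.25 * x[1] else 0 for x in X]
best_points, best_auc = evolve_points(X, y, n_features=4)
print(best_points, round(best_auc, 3))
```

Because the elite half always survives, the best AUROC in the population never decreases from one generation to the next; a real model would also need a held-out set to avoid overfitting the point values.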

Subjects

Cardiovascular Diseases; Adult; Algorithms; Cardiovascular Diseases/diagnosis; Cardiovascular Diseases/epidemiology; Cohort Studies; Humans; Iran/epidemiology; Machine Learning; Prospective Studies

A survey on single and multi omics data mining methods in cancer data classification.

Momeni, Zahra; Hassanzadeh, Esmail; Saniee Abadeh, Mohammad; Bellazzi, Riccardo.

J Biomed Inform ; 107: 103466, 2020 07.

Article in English | MEDLINE | ID: mdl-32525020

ABSTRACT

Data analytics is routinely used to support biomedical research in all areas, with particular focus on the most relevant clinical conditions, such as cancer. Bioinformatics approaches, in particular, have been used to characterize the molecular aspects of diseases. In recent years, numerous studies of cancer have been based on single- and multi-omics data. Single-omics studies, for example, have employed a diverse set of data types, such as gene expression, DNA methylation, or miRNA, to name only a few. Even so, a large part of the literature reports gene expression studies using microarray datasets. Single-omics data have a high number of attributes and very few samples, which makes them a paradigmatic under-sampled, small-n large-p machine learning problem. An important goal of single-omics data analysis is to find, among the available data, the genes most relevant for potential use in clinics and research. This problem has been addressed through gene selection, one of the pre-processing steps in data mining. Analyses that use only one type of data (single-omics) often miss the complexity of the molecular phenomena underlying the disease. As a result, they provide limited and sometimes unreliable information about disease mechanisms. Therefore, in recent years, researchers have sought to build more complex models that obtain more reliable results from multi-omics data. The most important challenge in achieving this, however, is data integration. In this paper, we provide a comprehensive overview of the challenges in single- and multi-omics analysis of cancer data, focusing on gene selection and data integration methods.
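Gene selection is described above as a pre-processing step for small-n, large-p data. One of the simplest families of such methods is filter-based ranking. The sketch below assumes a two-class problem and uses a Welch t-statistic as the ranking criterion (one common choice among many, not a method taken from this survey) to keep the k genes whose expression differs most between classes. The function names and the toy expression matrix are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic between two groups (each needs at least 2 samples)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0  # constant gene in both groups: no evidence of a difference
    return (mean(a) - mean(b)) / se


def select_top_genes(X, y, k):
    """X: samples x genes (list of rows); y: 0/1 class labels.
    Returns the indices of the k genes with the largest |t|."""
    n_genes = len(X[0])
    t_abs = []
    for g in range(n_genes):
        a = [row[g] for row, lab in zip(X, y) if lab == 1]
        b = [row[g] for row, lab in zip(X, y) if lab == 0]
        t_abs.append(abs(welch_t(a, b)))
    # Stable sort: ties keep the lower gene index first.
    return sorted(range(n_genes), key=lambda g: -t_abs[g])[:k]


# Hypothetical toy data: 6 samples x 5 genes; gene 2 separates the classes.
X_expr = [
    [1.0, 5.0, 9.0, 2.0, 3.0],
    [1.2, 5.1, 9.5, 2.1, 3.3],
    [0.9, 4.9, 8.8, 1.9, 2.9],
    [1.1, 5.2, 1.0, 2.0, 3.1],
    [1.0, 4.8, 1.5, 2.2, 3.0],
    [0.8, 5.0, 0.9, 1.8, 3.2],
]
y_cls = [1, 1, 1, 0, 0, 0]
top = select_top_genes(X_expr, y_cls, k=2)
print(top)
```

Univariate filters like this are fast enough for tens of thousands of genes, but they ignore interactions between genes, which is one reason wrapper and embedded selection methods are also used.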

Subjects

Genomics; Neoplasms; Computational Biology; Data Mining; Humans; Machine Learning; Neoplasms/genetics

DNA methylation-based age prediction using cell separation algorithm.

Jaddi, Najmeh Sadat; Saniee Abadeh, Mohammad.

Comput Biol Med ; 121: 103747, 2020 06.

Article in English | MEDLINE | ID: mdl-32339093

ABSTRACT

An individual's age can be predicted from the way DNA methylation changes with age. In this paper, an age prediction method is developed to solve multivariate regression problems on DNA methylation data by optimizing an artificial neural network (ANN) model with a newly proposed algorithm, the Cell Separation Algorithm (CSA). The CSA imitates cell separation by differential centrifugation, which involves multiple centrifugation steps with the rotor speed increasing at each step. Like centrifugal force, the CSA separates candidate solutions according to their objective function values over several steps, with the velocity increasing at each step. First, the CSA is tested on 25 test functions. Second, it is applied to three forms of the age prediction problem based on two body fluids (blood and saliva), using healthy blood samples, diseased blood samples, and saliva samples to test the method's capability. The CSA results are compared not only with methods proposed in previous studies, but also with stochastic gradient descent (SGD), ADAM, and a genetic algorithm (GA). The CSA model performs substantially better than the four previously proposed methods, which did not use an ANN training process. The CSA also outperforms SGD and ADAM, which train the ANN without meta-heuristic optimization. Its results are comparable to, or even better than, those of the GA model, which combines an ANN with a meta-heuristic.
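The abstract describes the CSA only through its centrifugation analogy. The following sketch is a loose, hypothetical reading of that analogy, not the published algorithm: at each step the population is ranked by objective value, the better half (the "pellet") is kept, and the rest (the "supernatant") is resampled around pellet members, with a search radius that shrinks as the assumed "rotor speed" increases. The sphere function stands in for one of the benchmark test functions; `centrifugation_search` and all its parameters are illustrative.

```python
import random


def sphere(x):
    # Standard benchmark test function; global minimum 0 at the origin.
    return sum(v * v for v in x)


def centrifugation_search(f, dim, pop_size=20, steps=5, iters_per_step=30,
                          bounds=(-5.0, 5.0), seed=0):
    """Hypothetical CSA-style minimizer (illustrative only, not the published CSA).
    Assumption: a higher 'rotor speed' at each step means a tighter search radius."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    history = [min(f(p) for p in pop)]
    for step in range(1, steps + 1):
        radius = (hi - lo) / (2 * step)            # speed rises, radius shrinks
        for _ in range(iters_per_step):
            pop.sort(key=f)                        # separate solutions by objective value
            pellet = pop[: pop_size // 2]          # better half "sediments" and is kept
            supernatant = []
            for _ in range(pop_size - len(pellet)):
                parent = rng.choice(pellet)
                child = [min(hi, max(lo, v + rng.uniform(-radius, radius)))
                         for v in parent]          # resample near the pellet, clipped to bounds
                supernatant.append(child)
            pop = pellet + supernatant
        history.append(min(f(p) for p in pop))     # best value after each step
    best = min(pop, key=f)
    return best, f(best), history


best_x, best_f, history = centrifugation_search(sphere, dim=3)
print([round(h, 4) for h in history])
```

Keeping the pellet makes the best objective value non-increasing across steps; in the paper the objective would instead be the ANN's age-prediction error.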

Subjects

Algorithms; DNA Methylation; Cell Separation; Neural Networks, Computer

Brain MRI analysis using a deep learning based evolutionary approach.

Shahamat, Hossein; Saniee Abadeh, Mohammad.

Neural Netw ; 126: 218-234, 2020 Jun.

Article in English | MEDLINE | ID: mdl-32259762

ABSTRACT

Convolutional neural network (CNN) models have recently demonstrated impressive performance in medical image analysis. However, there is no clear understanding of why they perform so well, or of what they have learned. In this paper, a three-dimensional convolutional neural network (3D-CNN) is employed to classify brain MRI scans into two predefined groups. In addition, a genetic algorithm based brain masking (GABM) method is proposed as a visualization technique that provides new insights into how the 3D-CNN works. The proposed GABM method consists of two main steps. In the first step, a set of brain MRI scans is used to train the 3D-CNN. In the second step, a genetic algorithm (GA) is applied to discover knowledgeable brain regions in the MRI scans. The knowledgeable regions are the areas of the brain that the 3D-CNN relies on most to extract important, discriminative features. To apply the GA to the brain MRI scans, a new chromosome encoding approach is proposed. The proposed framework has been evaluated on the ADNI (140 subjects, for Alzheimer's disease classification) and ABIDE (1000 subjects, for autism classification) brain MRI datasets. Experimental results show a 5-fold classification accuracy of 0.85 for the ADNI dataset and 0.70 for the ABIDE dataset. The proposed GABM method extracted 6 to 65 knowledgeable brain regions in the ADNI dataset and 15 to 75 in the ABIDE dataset. These regions are interpreted as the brain segments that the 3D-CNN mostly uses to extract features for brain disease classification. Experimental results show that, besides providing model interpretability, the proposed GABM method increased the final performance of the classification model in some cases, depending on the model parameters.
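The abstract does not give the details of the GABM chromosome encoding. A common way to encode a region selection for a GA, assumed here, is one bit per predefined brain region; the sketch below evolves such masks with binary tournament selection, uniform crossover, bit-flip mutation, and elitism. In GABM the fitness would come from the trained 3D-CNN evaluated on masked scans; here a hypothetical `toy_fitness`, in which three regions carry signal and each kept region costs a small penalty, stands in for it.

```python
import random


def tournament(pop, fitness, rng):
    # Binary tournament: the fitter of two random chromosomes becomes a parent.
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def evolve_mask(fitness, n_regions, pop_size=24, generations=50,
                mutation_rate=0.05, seed=0):
    """Evolve a binary mask with one bit per brain region (1 = region kept)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_regions)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        new_pop = [best[:]]                        # elitism: carry the best mask over
        while len(new_pop) < pop_size:
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            # Uniform crossover, then bit-flip mutation.
            child = [p1[i] if rng.random() < 0.5 else p2[i] for i in range(n_regions)]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=fitness)
    return best


# Hypothetical stand-in for "accuracy of the trained 3D-CNN when only the masked
# regions are visible": regions 1, 4 and 7 carry signal; each kept region costs 0.02.
SIGNAL = {1: 0.3, 4: 0.25, 7: 0.2}


def toy_fitness(mask):
    gain = sum(w for r, w in SIGNAL.items() if mask[r])
    return gain - 0.02 * sum(mask)


best_mask = evolve_mask(toy_fitness, n_regions=10)
print(best_mask, round(toy_fitness(best_mask), 2))
```

The size penalty is one simple way to push the search toward compact, interpretable masks; evaluating a real CNN for every chromosome is far more expensive, so caching fitness values would matter in practice.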

Subjects

Algorithms; Alzheimer Disease/diagnostic imaging; Autistic Disorder/diagnostic imaging; Brain/diagnostic imaging; Magnetic Resonance Imaging/methods; Neural Networks, Computer; Adolescent; Adult; Alzheimer Disease/classification; Autistic Disorder/classification; Biological Evolution; Child; Deep Learning; Female; Humans; Longitudinal Studies; Magnetic Resonance Imaging/classification; Male; Middle Aged; Young Adult

MapReduce-Based Parallel Genetic Algorithm for CpG-Site Selection in Age Prediction.

Momeni, Zahra; Saniee Abadeh, Mohammad.

Genes (Basel) ; 10(12)2019 11 25.

Article in English | MEDLINE | ID: mdl-31775313

ABSTRACT

Genomic biomarkers such as DNA methylation (DNAm) are employed for age prediction. In recent years, several studies have reported an association between changes in DNAm and human age. The high-dimensional nature of this type of data significantly increases the execution time of modeling algorithms. To mitigate this problem, we propose a two-stage parallel algorithm for selecting age-related CpG-sites. The algorithm first clusters the data into similar age ranges. In the next stage, a parallel genetic algorithm (PGA) based on the MapReduce paradigm (MR-based PGA) is used to select the age-related features of each age range. In the proposed method, the execution of the algorithm for each age range (data parallel), the evaluation of chromosomes (task parallel), and the calculation of the fitness function (data parallel) are performed using a novel parallel framework. In this paper, we consider 16 different healthy DNAm datasets from human blood tissue that contain the relevant age information. These datasets are merged into a single set, which is then randomly split into training and test sets at a 7:3 ratio. We build a Gradient Boosting Regressor (GBR) model on the CpG-sites selected from the training set. To evaluate model accuracy, we compared our results with state-of-the-art approaches that used these datasets and observed that our method performs better on the unseen test set, with a Mean Absolute Deviation (MAD) of 3.62 years and a correlation (R²) of 95.96% between age and DNAm. On the training data, the MAD and R² are 1.27 years and 99.27%, respectively. Finally, we evaluate the effect of parallelization on computation time. Without parallelization the algorithm requires 4123 min to complete, whereas parallel execution on 3 computing machines with 32 processing cores each takes only 58 min in total. This shows that our proposed algorithm is both efficient and scalable.
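The MapReduce idea of computing a chromosome's fitness over data partitions can be sketched with Python's `functools.reduce`: each map call computes a partial sum of absolute age errors on one partition, and the reduce step merges the partial results into the mean absolute deviation (MAD). The partitions run serially here, whereas a real MapReduce framework would distribute them across workers. The predictor `toy_predict` and the data are hypothetical. The last lines turn the reported timings into a speed-up and a parallel efficiency, assuming the 4123 min serial run is the baseline for all 96 cores.

```python
from functools import reduce


def map_partition(partition, predict):
    # Map step: one worker returns (sum of absolute errors, sample count) for its chunk.
    abs_err = sum(abs(predict(x) - age) for x, age in partition)
    return abs_err, len(partition)


def combine(a, b):
    # Reduce step: merge partial results from two workers.
    return a[0] + b[0], a[1] + b[1]


def mad_fitness(partitions, predict):
    """Mean absolute deviation (years) of predicted vs. true age, map-reduce style."""
    total_err, n = reduce(combine, (map_partition(p, predict) for p in partitions))
    return total_err / n


def toy_predict(cpg_values):
    # Hypothetical linear age model on the selected CpG sites.
    return 10 + 20 * cpg_values[0]


# Two hypothetical data partitions of (selected CpG values, chronological age).
partitions = [[([1.0], 31), ([2.0], 48)], [([3.0], 73)]]
mad = mad_fitness(partitions, toy_predict)

# Timings from the abstract: 4123 min serial vs. 58 min on 3 machines x 32 cores.
serial_min, parallel_min, cores = 4123, 58, 3 * 32
speedup = serial_min / parallel_min      # about 71.1x
efficiency = speedup / cores             # about 0.74 of ideal linear scaling
print(mad, round(speedup, 1), round(efficiency, 2))
```

Because partial sums and counts combine associatively, partitions can be reduced in any grouping, which is what allows the fitness computation to be split across machines.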

Subjects

Aging/genetics; Computational Biology/methods; CpG Islands; Algorithms; DNA Methylation; Epigenesis, Genetic; Genetic Fitness; Humans; Models, Genetic
